Abstract

Theoretical computer scientists have been debating the role of oracles since the 1970's. This pa- 
per illustrates both that oracles can give us nontrivial insights about the barrier problems in circuit 
complexity, and that they need not prevent us from trying to solve those problems. 

First, we give an oracle relative to which PP has linear-sized circuits, by proving a new lower bound 
for perceptrons and low-degree threshold polynomials. This oracle settles a longstanding open question, 
and generalizes earlier results due to Beigel and to Buhrman, Fortnow, and Thierauf. More importantly, 
it implies the first nonrelativizing separation of "traditional" complexity classes, as opposed to interactive 
proof classes such as MIP and $\mathsf{MA}_{\mathsf{EXP}}$. For Vinodchandran showed, by a nonrelativizing argument, that
PP does not have circuits of size $n^k$ for any fixed $k$. We present an alternative proof of this fact, which
shows that PP does not even have quantum circuits of size $n^k$ with quantum advice. To our knowledge,
this is the first nontrivial lower bound on quantum circuit size. 

Second, we study a beautiful algorithm of Bshouty et al. for learning Boolean circuits in $\mathsf{ZPP}^{\mathsf{NP}}$.
We show that the NP queries in this algorithm cannot be parallelized by any relativizing technique,
by giving an oracle relative to which $\mathsf{ZPP}^{\mathsf{NP}}_{\|}$ and even $\mathsf{BPP}^{\mathsf{NP}}_{\|}$ have linear-size circuits. On the other
hand, we also show that the NP queries could be parallelized if P = NP. Thus, classes such as $\mathsf{ZPP}^{\mathsf{NP}}_{\|}$
inhabit a "twilight zone," where we need to distinguish between relativizing and black-box techniques.
Our results on this subject have implications for computational learning theory as well as for the circuit 
minimization problem. 

1 Introduction 

It is often lamented that, half a century after Shannon's insight [30] that almost all Boolean functions require
exponential-size circuits, there is still no explicit function for which we can prove even a superlinear lower
bound. Yet whether this lament is justified depends on what we mean by "explicit." For in 1982, Kannan
[18] did show that for every constant $k$, there exists a language in $\Sigma_2^p$ (the second level of the polynomial
hierarchy) that does not have circuits of size $n^k$. His proof used the oldest trick in the book: diagonalization,
defined broadly as any method for simulating all machines in one class by a single machine in another. In
some sense, diagonalization is still the only method we know that zeroes in on a specific property of the
function being lower-bounded, and thereby escapes the jaws of Razborov and Rudich [27].

But can we generalize Kannan's theorem to other complexity classes? A decade ago, Bshouty et al. [8]
discovered an algorithm to learn Boolean circuits in $\mathsf{ZPP}^{\mathsf{NP}}$ (that is, probabilistic polynomial time with NP
oracle). As noticed by Köbler and Watanabe [20], the existence of this algorithm implies that $\mathsf{ZPP}^{\mathsf{NP}}$ itself
cannot have circuits of size $n^k$ for any $k$.¹

So our task as lowerboundsmen and lowerboundswomen seems straightforward: namely, to find increasingly
powerful algorithms for learning Boolean circuits, which can then be turned around to yield increasingly
powerful circuit lower bounds. But when we try to do this, we quickly run into the brick wall of relativization. 

* Email: aaronson@ias.edu. This research was done while the author was a postdoc at the Institute for Advanced Study in 
Princeton, supported by an NSF grant. 

¹ For Bshouty et al.'s algorithm implies the following improvement to the celebrated Karp-Lipton theorem [19]: if $\mathsf{NP} \subset \mathsf{P/poly}$
then PH collapses to $\mathsf{ZPP}^{\mathsf{NP}}$. There are then two cases: if $\mathsf{NP} \not\subset \mathsf{P/poly}$, then certainly $\mathsf{ZPP}^{\mathsf{NP}} \not\subset \mathsf{P/poly}$ as well and we are
done. On the other hand, if $\mathsf{NP} \subset \mathsf{P/poly}$, then $\mathsf{ZPP}^{\mathsf{NP}} = \mathsf{PH}$, but we already know from Kannan's theorem that PH does not
have circuits of size $n^k$. Indeed, we can repeat this argument for the class $\mathsf{S}_2^p$, which Cai [11] showed is contained in $\mathsf{ZPP}^{\mathsf{NP}}$.
Just as Baker, Gill, and Solovay |B] gave a relativized world where P = NP, so Wilson 0U] gave relativized
worlds where NP and $\mathsf{P}^{\mathsf{NP}}$ have linear-size circuits. Since the results of Kannan [18] and Bshouty et al. [8]
relativize, this suggests that new techniques will be needed to make further progress.

Yet attitudes toward relativization vary greatly within our community. Some computer scientists ridicule 
oracle results as elaborate formalizations of the obvious — apparently believing that (1) there exist relativized 
worlds where just about anything is true, (2) the creation of such worlds is a routine exercise, (3) the only 
conjectures ruled out by oracle results are trivially false ones, which no serious researcher would waste time 
trying to prove, and (4) nonrelativizing results such as IP = PSPACE render oracles irrelevant anyway.
At the other extreme, some computer scientists see oracle results not as a spur to create nonrelativizing
techniques or as a guide to where such techniques might be needed, but as an excuse to abandon hope. 

This paper will offer new counterexamples to both of these views, in the context of circuit lower bounds. 
We focus on two related topics: first, the classical and quantum circuit complexity of PP; and second, the 
learnability of Boolean circuits using parallel NP queries. 

1.1 On PP and Quantum Circuits 

In Section 2 we give an oracle relative to which PP has linear-size circuits. Here PP is the class of languages
accepted by a nondeterministic polynomial-time Turing machine that accepts if and only if the majority
of its paths do. Our construction also yields an oracle relative to which PEXP (the exponential-time version
of PP) has polynomial-size circuits, and indeed $\mathsf{P}^{\mathsf{NP}} = \oplus\mathsf{P} = \mathsf{PEXP}$. This settles several questions that were
open for years,² and subsumes at least three previous results:

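(1) that of Beigel [7] giving an oracle relative to which $\mathsf{P}^{\mathsf{NP}} \not\subset \mathsf{PP}$ (since clearly $\mathsf{P}^{\mathsf{NP}} = \mathsf{PEXP}$ implies
$\mathsf{P}^{\mathsf{NP}} \not\subset \mathsf{PP}$),

(2) that of Buhrman, Fortnow, and Thierauf [10] giving an oracle relative to which $\mathsf{MA}_{\mathsf{EXP}} \subset \mathsf{P/poly}$, and

(3) that of Buhrman et al. [9] giving an oracle relative to which $\mathsf{P}^{\mathsf{NP}} = \mathsf{NEXP}$.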

Note that our result is nearly optimal, since Toda's theorem yields a relativizing proof that $\mathsf{P}^{\mathsf{PP}}$ and
even $\mathsf{BP} \cdot \mathsf{PP}$ do not have circuits of any fixed polynomial size.

Our proof first represents each PP machine by a low-degree multilinear polynomial, whose variables are 
the bits of the oracle string. It then combines these polynomials into a single polynomial called Q. The key 
fact is that, if there are no variables left "unmonitored" by the component polynomials, then we can modify 
the oracle in a way that increases Q. Since Q can only increase a finite number of times, it follows that we 
will eventually win our "war of attrition" against the polynomials, at which point we can simply write down 
what each machine does in an unmonitored part of the oracle string. The main novelty of the proof lies in 
how we combine the polynomials to create Q. 

We can state our result alternatively in terms of perceptrons, also known as threshold-of-AND circuits
or polynomial threshold functions. Call a perceptron "small" if it has size $2^{n^{o(1)}}$, order $n^{o(1)}$, and weights
in $\{-1, 1\}$. Also, given an $n$-bit string $x_1 \ldots x_n$, recall that the ODDMAXBIT problem is to decide whether
the maximum $i$ such that $x_i = 1$ is even or odd, promised that such an $i$ exists. Then Beigel [7] showed that
no small perceptron can solve ODDMAXBIT. What we show is a strong generalization of Beigel's theorem: 
for any k = small perceptrons, there exists a "problem set" consisting of k ODDMAXBIT instances, 
such that for every i, the i th perceptron will get the i th problem wrong even if it can examine the whole 
problem set. Previously this had been open even for k = 2. 
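
To make the ODDMAXBIT definition concrete, here is a minimal Python sketch. The function name `odd_max_bit` and the list-of-bits encoding are illustrative assumptions, not notation from the paper; positions are 1-indexed to match the text.

```python
def odd_max_bit(x):
    """Return True if the largest 1-indexed position i with x[i] == 1 is odd.

    Assumes the promise from the text: x contains at least one 1.
    """
    ones = [i for i, bit in enumerate(x, start=1) if bit == 1]
    if not ones:
        raise ValueError("promise violated: input has no 1 bit")
    return max(ones) % 2 == 1
```

The perceptron lower bound says that no "small" threshold-of-AND circuit computes this function, even though it is trivial to compute by scanning the string from the right.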

But the real motivation for our result is that in the unrelativized world, PP is known not to have linear-
size circuits. Indeed, Vinodchandran showed that for every $k$, there exists a language in PP that does not
have circuits of size $n^k$. As a consequence, we obtain the first nonrelativizing separation that does not involve
artificial classes or classes defined using interactive proofs. There have been nonrelativizing separations in 
the past, but most of them have followed easily from the collapse of interactive proof classes: for example, 

² Lance Fortnow, personal communication.
$\mathsf{NP} \neq \mathsf{MIP}$ from $\mathsf{MIP} = \mathsf{NEXP}$, and $\mathsf{IP} \not\subset \mathsf{SIZE}(n^k)$ from $\mathsf{IP} = \mathsf{PSPACE}$. The one exception was the
result of Buhrman, Fortnow, and Thierauf [10] that $\mathsf{MA}_{\mathsf{EXP}} \not\subset \mathsf{P/poly}$, where $\mathsf{MA}_{\mathsf{EXP}}$ is the exponential-time
version of MA. However, the class $\mathsf{MA}_{\mathsf{EXP}}$ exists for the specific purpose of not being contained in P/poly,
and the resulting separation does not scale down below NEXP, to show (for example) that MA does not have
linear-size circuits. By contrast, PP is one of the most natural complexity classes there is. That is why, in 
our opinion, our result adds some heft to the idea that currently-understood nonrelativizing techniques can 
lead to progress on the fundamental questions of complexity theory. 

The actual lower bound of Vinodchandran follows easily from three well-known results: the LFKN
interactive protocol for the permanent [22], Toda's theorem [35], and Kannan's theorem [18].³ In Section 3
we present an alternative, more self-contained proof, which does not go through Toda's theorem. As a bonus,
our proof also shows that PP does not have quantum circuits of size $n^k$ for any $k$. Indeed, this remains true
even if the quantum circuits are given "quantum advice states" on $n^k$ qubits, which might require exponential
time to prepare. One part of our proof is a "quantum Karp-Lipton theorem," which states that if PP has
polynomial-size quantum circuits, then the "counting hierarchy" (consisting of PP, $\mathsf{PP}^{\mathsf{PP}}$, $\mathsf{PP}^{\mathsf{PP}^{\mathsf{PP}}}$, and so
on) collapses to QMA, the quantum analogue of NP. By analogy to the classical nonrelativizing separation
of Buhrman, Fortnow, and Thierauf [10], we also show that $\mathsf{QMA}_{\mathsf{EXP}}$, the exponential-time version of QMA,
is not contained in BQP/qpoly. Indeed, $\mathsf{QMA}_{\mathsf{EXP}}$ requires quantum circuits of at least "half-exponential"
size, meaning size $f(n)$ where $f(f(n))$ grows exponentially.⁴

While none of the results in Section 3 are really difficult, we include them here for three reasons:

(1) So far as we know, the only existing lower bounds for arbitrary quantum circuits are due to Nishimura
and Yamakami, who showed (among other things) that $\mathsf{EESPACE} \not\subset \mathsf{BQP/qpoly}$.⁵ We felt it
worthwhile to point out that much better bounds are possible.

(2) When it comes to understanding the limitations of quantum computers, our knowledge to date consists 
almost entirely of oracle lower bounds. Many researchers have told us that they would much prefer 
to see some unrelativized results, or at the very least conditional statements, such as "if NP-complete
problems are solvable in quantum polynomial time, then the polynomial hierarchy collapses."
The results of Section 3 represent a first step in that direction.

(3) Recently Aaronson [2] gave a new characterization of PP, as the class of problems solvable in quantum
polynomial time, given the ability to postselect (that is, to discard all runs of the computation in
which a given measurement result does not occur). If we replace "quantum" by "randomized" in this
definition, then we obtain a classical complexity class called $\mathsf{BPP}_{\mathsf{path}}$, which was introduced by Han,
Hemaspaandra, and Thierauf [16]. So the fact that we can prove a quantum circuit lower bound for PP
implies one of two things: either that (i) we can prove a nonrelativizing quantum separation theorem,
but not the classical analogue of the same theorem, or that (ii) we should be able to prove a classical
circuit lower bound for $\mathsf{BPP}_{\mathsf{path}}$. As we will see later, the latter possibility would be a significant
breakthrough.

1.2 On Parallel NP Queries and Black-Box Learning 

In a second part of the paper, we study the learning algorithm of Bshouty et al. [8] mentioned earlier. Given
a Boolean function $f$ that is promised to have a polynomial-size circuit, this algorithm finds such a circuit
in the class $\mathsf{ZPP}^{\mathsf{NP}^f}$: that is, zero-error probabilistic polynomial time with NP oracle with oracle for $f$. One
of the most basic questions about this algorithm is whether the NP queries can be made nonadaptive. For
if so, then we immediately obtain a new circuit lower bound: namely that $\mathsf{ZPP}^{\mathsf{NP}}_{\|}$ (that is, ZPP with parallel

³ Suppose by contradiction that PP has circuits of size $n^k$. Then $\mathsf{P}^{\#\mathsf{P}} \subset \mathsf{P/poly}$, and therefore $\mathsf{MA} = \mathsf{PP} = \mathsf{P}^{\#\mathsf{P}}$ by a result
of LFKN [22] (this is the only part of the proof that fails to relativize). Now $\mathsf{MA} \subset \Sigma_2^p \subset \mathsf{P}^{\#\mathsf{P}}$ by Toda's theorem [35], so
$\Sigma_2^p = \mathsf{PP}$ as well. But we already know from Kannan's theorem [18] that $\Sigma_2^p$ does not have circuits of size $n^k$.

⁴ See Miltersen, Vinodchandran, and Watanabe [23] for a discussion of this concept.

⁵ A similar bound is implicit in a paper by Stockmeyer and Meyer [34].
NP queries) does not have circuits of size $n^k$ for any $k$.⁶ Conceptually, this would not be so far from showing
that NP itself does not have circuits of size $n^k$.⁷

Let $C$ be the set of circuits of size $n^k$. In Bshouty et al.'s algorithm, we repeatedly ask the NP oracle
to find us an input $x_t$ such that, among the circuits in $C$ that succeed on all previous inputs $x_1, \ldots, x_{t-1}$, at
least a 1/3 fraction fail on $x_t$. Since each such input reduces the number of circuits "still in the running" by
at least a constant factor, this process can continue for at most $\log |C|$ steps. Furthermore, when it ends, by
assumption we have a set $C^*$ of circuits such that for all inputs $x$, a uniform random circuit drawn from $C^*$
will succeed on $x$ with probability at least 2/3. So now all we have to do is sample a polynomial number of
circuits from $C^*$, then generate a new circuit that outputs the majority answer among the sampled circuits.
The technical part is to express the concepts "at least a 1/3 fraction" and "a uniform random sample" in
NP. For that Bshouty et al. use pairwise-independent hash functions.
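
The elimination loop just described can be sketched in Python on a toy hypothesis class. This is only an illustration under loud assumptions: brute-force search over all inputs stands in for the NP-oracle query, the majority vote is taken over all surviving hypotheses rather than over a hashed random sample, and the names `learn_by_elimination`, `HYPOTHESES`, and `xor` are hypothetical.

```python
from itertools import product


def learn_by_elimination(target, hypotheses, inputs):
    """Toy version of the elimination loop described above.

    Brute-force search over `inputs` stands in for the NP-oracle query, and
    the final majority vote is taken over all survivors rather than a sample.
    Assumes `target` agrees everywhere with at least one hypothesis.
    """
    alive = list(hypotheses)
    rounds = 0
    while True:
        # Look for an input on which at least 1/3 of the survivors are wrong.
        witness = next(
            (x for x in inputs
             if 3 * sum(h(x) != target(x) for h in alive) >= len(alive)),
            None,
        )
        if witness is None:
            break
        # Keep only the hypotheses that are right on the witness.
        alive = [h for h in alive if h(witness) == target(witness)]
        rounds += 1

    def majority(x):
        # More than 2/3 of the survivors are right on every input here.
        return int(2 * sum(h(x) for h in alive) > len(alive))

    return majority, rounds


# Hypothesis class: all 16 Boolean functions on 2 bits, as truth-table lookups.
INPUTS = list(product([0, 1], repeat=2))
HYPOTHESES = [
    (lambda x, table=table: table[INPUTS.index(x)])
    for table in product([0, 1], repeat=4)
]


def xor(x):
    return x[0] ^ x[1]
```

Calling `learn_by_elimination(xor, HYPOTHESES, INPUTS)` returns a majority-vote function that agrees with `xor` on every input. Each round discards at least a third of the survivors, which is the source of the logarithmic bound on the number of rounds.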

When we examine the above algorithm, it is far from obvious that adaptive NP queries are necessary. 
For why can't we simply ask the following question in parallel, for all $T < \log |C|$?

"Do there exist inputs $x_1, \ldots, x_T$, such that at least a 1/3 fraction of circuits in $C$ fail on $x_1$, and
among the circuits that succeed on $x_1$, at least a 1/3 fraction fail on $x_2$, and among the circuits
that succeed on $x_1$ and $x_2$, at least a 1/3 fraction fail on $x_3$, ... and so on up to $x_T$?"

By making clever use of hashing and approximate counting, perhaps we could control the number of 
circuits that succeed on $x_1, \ldots, x_t$ for all $t < T$. In that case, by finding the largest $T$ such that the above
question returns a positive answer, and then applying the Valiant-Vazirani reduction [38] and other standard
techniques, we would achieve the desired parallelization of Bshouty et al.'s algorithm. Indeed, when we 
began studying the topic, it seemed entirely likely to us that this was possible. 

Nevertheless, in Section 4 we give an oracle relative to which $\mathsf{ZPP}^{\mathsf{NP}}_{\|}$ and even $\mathsf{BPP}^{\mathsf{NP}}_{\|}$ have linear-size
circuits. The overall strategy of our oracle construction is the same as for PP, but the details are somewhat 
less elegant. The existence of this oracle means that any parallelization of Bshouty et al.'s algorithm will 
need to use nonrelativizing techniques. 

Yet even here, the truth is subtler than one might imagine. To explain why, we need to distinguish 
carefully between relativizing and black-box algorithms. An algorithm for learning Boolean circuits is 
relativizing if, when given access to an oracle A, the algorithm can learn circuits that are also given access 
to A. But a nonrelativizing algorithm can still be black-box, in the sense that it learns about the target 
function $f$ only by querying it, and does not exploit any succinct description of $f$ (for example, that $f(x) = 1$
if and only if $x$ encodes a satisfiable Boolean formula). Bshouty et al.'s algorithm is both relativizing and
black-box. What our oracle construction shows is that no relativizing algorithm can learn Boolean circuits
in $\mathsf{BPP}^{\mathsf{NP}}_{\|}$. But what about a nonrelativizing yet still black-box algorithm?

Surprisingly, we show in Section 5 that if P = NP, then there is a black-box algorithm to learn Boolean
circuits even in $\mathsf{P}^{\mathsf{NP}}_{\|}$ (as well as in NP/log). Despite the outlandishness of the premise, this theorem is not
trivial, and requires many of the same techniques originally used by Bshouty et al. [8]. One way to interpret
the theorem is that we cannot show the impossibility of black-box learning in $\mathsf{P}^{\mathsf{NP}}_{\|}$, without also showing
that $\mathsf{P} \neq \mathsf{NP}$. By contrast, it is easy to show that black-box learning is impossible in NP, regardless of what
computational assumptions we make.⁸

These results provide a new perspective on one of the oldest problems in computer science, the circuit 
minimization problem: given a Boolean circuit C, does there exist an equivalent circuit of size at most s? 
Certainly this problem is NP-hard and in $\Sigma_2^p$. Also, by using Bshouty et al.'s algorithm, we can find a
circuit whose size is within an $O(n / \log n)$ factor of minimal in $\mathsf{ZPP}^{\mathsf{NP}}$. Yet after fifty years of research,

⁶ This follows from the same reasoning used by Köbler and Watanabe [20] to show that $\mathsf{ZPP}^{\mathsf{NP}}$ does not have circuits of size
$n^k$. For such an algorithm would readily imply that if $\mathsf{NP} \subset \mathsf{P/poly}$, then PH collapses to $\mathsf{ZPP}^{\mathsf{NP}}_{\|}$.

⁷ For as observed by Shaltiel and Umans [28] and Fortnow and Klivans [13] among others, there is an intimate connection
between the classes $\mathsf{P}^{\mathsf{NP}}_{\|}$ and NP/log. Furthermore, any circuit lower bound for NP/log implies the same lower bound for NP,
since we can tack the advice onto the input.

⁸ Note that by "learn," we always mean "learn exactly" rather than "PAC-learn." Of course, if P = NP, then approximate
learning of Boolean circuits could be done in polynomial time.
almost nothing else is known about the complexity of this problem. For example, is it $\Sigma_2^p$-complete? Can
we approximate the minimum circuit size in $\mathsf{ZPP}^{\mathsf{NP}}_{\|}$?

What our techniques let us say is the following. First, there exists an oracle A such that minimizing 
circuits with oracle access to $A$ is not even approximable in $\mathsf{BPP}^{\mathsf{NP}}_{\|}$. Indeed, any probabilistic algorithm
to distinguish the cases "$C$ is minimal" and "there exists an equivalent circuit for $C$ of size $s$," using fewer
than $s$ adaptive NP queries, would have to use nonrelativizing techniques. If one wished, one could take
this as evidence that the true complexity of the circuit minimization problem should be $\mathsf{P}^{\mathsf{NP}}$ rather than
$\mathsf{P}^{\mathsf{NP}}_{\|}$. On the other hand, one cannot rule out even a "black-box" circuit minimization algorithm (that is,
an algorithm that treats $C$ itself as an oracle) in $\mathsf{P}^{\mathsf{NP}}_{\|}$, without also showing that $\mathsf{P} \neq \mathsf{NP}$.

From a learning theory perspective, perhaps what is most interesting about our results is that they show 
a clear tradeoff between two complexities: the complexity of the learner who queries the target function $f$,
and the complexity of the resulting computational problem that the learner has to solve. If the learner
is a $\mathsf{ZPP}^{\mathsf{NP}}$ machine, then the problem is easy; if the learner is a $\mathsf{ZPP}^{\mathsf{NP}}_{\|}$ machine, then the problem is
(probably) hard; and if the learner is an NP machine, then there is no computational problem whose
solution would suffice to learn $f$.

1.3 Outlook 

Figure 1 shows the "battle map" for nonrelativizing circuit lower bounds that emerges from this paper. The
figure displays not one but two barriers: a "relativization barrier," below which any Karp-Lipton collapse or
superlinear circuit size lower bound will need to use nonrelativizing techniques; and a "black-box barrier,"
below which black-box learning even of unrelativized circuits is provably impossible. At least for the thirteen 
complexity classes shown in the figure, we now know exactly where to draw these two barriers — something 
that would have been less than obvious a priori (at least to us!). 

To switch metaphors, we can think of the barriers as representing "phase transitions" in the behavior of 
complexity classes. Below the black-box barrier, we cannot learn circuits relative to any oracle A. Between 
the relativization and black-box barriers, we can learn Boolean circuits relative to some oracles A but not 
others. For example, we can learn relative to a PSPACE oracle, since it collapses P and NP, but we cannot 
learn relative to the oracles in this paper, which cause PP and $\mathsf{BPP}^{\mathsf{NP}}_{\|}$ to have linear-size circuits. Finally,
above the relativization barrier, we can learn Boolean circuits relative to every oracle $A$.⁹ As we move
upward from the black-box barrier toward the relativization barrier, we can notice "steam bubbles" starting 
to form, as the assumptions needed for black-box learning shift from implausible (P = NP), to plausible (the 
standard derandomization assumptions that collapse $\mathsf{P}^{\mathsf{NP}}$ with $\mathsf{ZPP}^{\mathsf{NP}}$ and PP with $\mathsf{BP} \cdot \mathsf{PP}$), and finally to
no assumptions at all. 

To switch metaphors again, the oracle results have laid before us a rich and detailed landscape, which a 
nonrelativizing Lewis-and-Clark expedition might someday visit more fully. 

2 The Oracle for PP 

In this section we construct an oracle relative to which PP has linear-size circuits. To do so, we will need a 
lemma about multilinear polynomials, which follows from the well-known lower bound of Nisan and Szegedy 
on the approximate degree of the OR function. 

Lemma 1 (Nisan-Szegedy) Let $p : \{0,1\}^N \to \mathbb{R}$ be a real multilinear polynomial of degree at most $\sqrt{N}/7$,
and suppose that $|p(X)| \leq \frac{2}{3} |p(0^N)|$ for all $X \in \{0,1\}^N$ with Hamming weight 1. Then there exists an
$X \in \{0,1\}^N$ such that $|p(X)| \geq 6 |p(0^N)|$.

⁹ There is one important caveat: in $\mathsf{S}_2^p$, we currently only know how to learn self-reducible functions, such as the characteristic
functions of NP-complete problems. For if the circuits from the two competing provers disagree with each other, then we need 
to know which one to trust. 




Figure 1: "Battle map" of some complexity classes between NP and BP • PP, in light of this paper's results. 
Classes that coincide under a plausible derandomization assumption are grouped together with dashed ovals. 
Below the relativization barrier, we must use nonrelativizing techniques to show any Karp-Lipton collapse 
or superlinear circuit size lower bound. Below the black-box barrier, black-box learning of Boolean circuits 
is provably impossible. 






We now prove the main result. 
Theorem 2 There exists an oracle relative to which PP has linear-size circuits. 

Proof. For simplicity, we first give an oracle that works for a specific value of $n$, and then generalize to
all $n$ simultaneously. Let $M_1, M_2, \ldots$ be an enumeration of $\mathsf{PTIME}(n^{\log n})$ machines. Then it suffices to
simulate $M_1, \ldots, M_n$, for in that case every $M_i$ will be simulated on all but finitely many $n$.

The oracle $A$ will consist of $2^{5n}$ "rows" and $n 2^n$ "columns," with each row labeled by a string $r \in \{0,1\}^{5n}$,
and each column labeled by a pair $(i, x)$ where $i \in \{1, \ldots, n\}$ and $x \in \{0,1\}^n$. Then given a triple $(r, i, x)$
as input, $A$ will return the bit $A(r, i, x)$.

We will construct $A$ via an iterative procedure. Initially $A$ is empty (that is, $A(r, i, x) = 0$ for all $r, i, x$).
Let $A_t$ be the state of $A$ after the $t$-th iteration. Also, let $M_{i,x}(A)$ be a Boolean function that equals 1 if
$M_i$ accepts on input $x \in \{0,1\}^n$ and oracle string $A$, and 0 otherwise. Then to encode a row $r$ means to
set $A_t(r, i, x) := M_{i,x}(A_{t-1})$ for all $i, x$. At a high level, our entire procedure will consist of repeating the
following two steps, for all $t \geq 1$:

(1) Choose a set of rows $S \subseteq \{0,1\}^{5n}$ of $A_{t-1}$.

(2) Encode each $r \in S$, and let $A_t$ be the result.

The problem, of course, is that each time we encode a row $r$, the $M_{i,x}(A)$'s might change as a result.
So we need to show that, by carefully implementing step (1), we can guarantee that the following condition 
holds after a finite number of steps. 

(C) There exists an $r$ such that $A(r, i, x) = M_{i,x}(A)$ for all $i, x$.

If (C) is satisfied, then clearly $M_1, \ldots, M_n$ will have linear-size circuits relative to $A$, since we can just
hardwire r into the circuits. 

We will use the following fact, which is immediate from the definition of PP. For all $i, x$, there exists a
multilinear polynomial $p_{i,x}(A)$, whose variables are the bits of $A$, such that:

(i) If $M_{i,x}(A) = 1$ then $p_{i,x}(A) \geq 1$.

(ii) If $M_{i,x}(A) = 0$ then $p_{i,x}(A) \leq -1$.

(iii) $p_{i,x}$ has degree at most $n^{\log n}$.

(iv) $|p_{i,x}(A)| \leq 2^{n^{\log n}}$ for all $A$.

Now for all integers $0 \leq k \leq n^{\log n}$ and $b \in \{0,1\}$, let

$$q_{i,x,b,k}(A) = 2^{2k-3} + \left(2^k + (-1)^b p_{i,x}(A)\right)^2.$$

Then we will use the following polynomial as a progress measure:

$$Q(A) = \prod_{i,x} \prod_{b \in \{0,1\}} \prod_{k=0}^{n^{\log n}} q_{i,x,b,k}(A).$$

Notice that

$$\deg(Q) \leq n 2^n \cdot 2 \cdot \left(n^{\log n} + 1\right) \cdot 2 \deg(p_{i,x}).$$

Since $1/8 \leq q_{i,x,b,k}(A) \leq 5 \cdot 2^{2 n^{\log n}}$ for all $i, x, b, k$, we also have

$$2^{-2^{n+o(n)}} \leq Q(A) \leq 2^{2^{n+o(n)}}$$

for all $A$. The key claim is the following.

At any given iteration, suppose there is no $r$ such that, by encoding $r$, we can satisfy condition (C).
Then there exists a set $S \subseteq \{0,1\}^{5n}$ such that, by encoding each $r \in S$, we can increase $Q(A)$ by at least a
factor of 2 (that is, ensure that $Q(A_t) \geq 2 Q(A_{t-1})$).

The above claim readily implies that (C) can be satisfied after a finite number of steps. For, by what
was said previously, $Q(A)$ can double at most $2^{n+o(n)}$ times, and once $Q(A)$ can no longer double, by
assumption we can encode an $r$ that satisfies (C). (As a side note, "running out of rows" is not an issue
here, since we can re-encode rows that were encoded in previous iterations.)
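
The bound on the number of doublings can be spelled out as follows. This is a sketch that takes as given the per-factor bounds $1/8 \leq q_{i,x,b,k}(A) \leq 5 \cdot 2^{2 n^{\log n}}$ stated earlier, and writes $m$ (a symbol introduced here for convenience, not used in the original) for the number of factors in the product defining $Q$.

```latex
\begin{align*}
m &= n 2^n \cdot 2 \cdot \left(n^{\log n} + 1\right) = 2^{n + o(n)}, \\
8^{-m} &\le Q(A) \le \left(5 \cdot 2^{2 n^{\log n}}\right)^{m}, \\
\#\text{doublings} &\le \log_2 \frac{\left(5 \cdot 2^{2 n^{\log n}}\right)^{m}}{8^{-m}}
  = m \left(3 + \log_2 5 + 2 n^{\log n}\right) = 2^{n + o(n)}.
\end{align*}
```

The last step uses $n^{\log n} = 2^{\log^2 n} = 2^{o(n)}$, so the extra factor is absorbed into the $o(n)$ term of the exponent.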

We now prove the claim. Call the pair $(i, x)$ sensitive to row $r$ if encoding $r$ would change the value
of $M_{i,x}(A)$. By hypothesis, for every $r$ there exists an $(i, x)$ that is sensitive to $r$. So by a counting
argument, there exists a single $(i, x)$ that is sensitive to at least $2^{5n} / (n 2^n) \geq 2^{3n}$ rows. Fix that $(i, x)$, and
let $r_1, \ldots, r_{2^{3n}}$ be the first $2^{3n}$ rows to which $(i, x)$ is sensitive. Also, given a binary string $Y = y_1 \ldots y_{2^{3n}}$,
let $S(Y)$ be the set of all $r_j$ such that $y_j = 1$, and let $A^{(Y)}$ be the oracle obtained by starting from $A$ and
then encoding each $r_j \in S(Y)$.
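
The counting step is a pigeonhole argument: each of the $2^{5n}$ rows has at least one sensitive pair, and there are only $n 2^n$ pairs, so some pair is sensitive to at least the average number of rows. As a short worked check (our own arithmetic, using only $n \leq 2^n$):

```latex
\[
\frac{2^{5n}}{n 2^n} = \frac{2^{4n}}{n} \ge \frac{2^{4n}}{2^n} = 2^{3n}.
\]
```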

Set $b$ equal to $M_{i,x}(A)$, and set $k$ equal to the least integer such that $2^k \geq |p_{i,x}(A)|$. Then we will think of
$Q(A)$ as the product of two polynomials $q(A)$ and $v(A)$, where $q(A) = q_{i,x,b,k}(A)$, and $v(A) = Q(A) / q(A)$
is the product of all other terms in $Q(A)$. Notice that $q(A) > 0$ and $v(A) > 0$ for all $A$. Also,

$$\begin{aligned}
q(A) &= 2^{2k-3} + \left(2^k + (-1)^b p_{i,x}(A)\right)^2 \\
&\leq 2^{2k-3} + \left(2^k - 2^{k-1}\right)^2 \\
&= \frac{3}{8} \cdot 2^{2k}.
\end{aligned}$$

Here the second line follows since $-2^k \leq (-1)^b p_{i,x}(A) \leq -2^{k-1}$. On the other hand, for all $Y \in \{0,1\}^{2^{3n}}$
with Hamming weight 1, we have $(-1)^b p_{i,x}(A^{(Y)}) > 0$, and therefore

$$\begin{aligned}
q(A^{(Y)}) &= 2^{2k-3} + \left(2^k + (-1)^b p_{i,x}(A^{(Y)})\right)^2 \\
&\geq 2^{2k-3} + \left(2^k\right)^2 \\
&= \frac{9}{8} \cdot 2^{2k} \\
&\geq 3 q(A).
\end{aligned}$$
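
The two bounds on $q$ are elementary, and a small exact-arithmetic check may help convince a skeptical reader. The Python sketch below is our own illustration, not part of the proof: it treats the value of $p_{i,x}$ before and after a flip as a free integer of magnitude at least 1, and the names `q`, `least_k`, and `check` are hypothetical.

```python
from fractions import Fraction


def q(p, b, k):
    """q_{i,x,b,k} evaluated at an oracle where p_{i,x} takes the value p."""
    return Fraction(2) ** (2 * k - 3) + (2 ** k + (-1) ** b * p) ** 2


def least_k(p):
    """Least integer k >= 0 with 2**k >= |p| (requires |p| >= 1)."""
    return (abs(p) - 1).bit_length()


def check(max_abs=64):
    """Verify q <= (3/8) 4^k before a flip and q >= (9/8) 4^k >= 3q after."""
    for p in list(range(-max_abs, 0)) + list(range(1, max_abs + 1)):
        b = 1 if p >= 1 else 0          # b = M_{i,x}(A): accept iff p >= 1
        k = least_k(p)
        before = q(p, b, k)
        assert before <= Fraction(3, 8) * 4 ** k
        # After a flip the sign of p changes; try every magnitude in range.
        for mag in range(1, max_abs + 1):
            p_after = -mag if b == 1 else mag
            after = q(p_after, b, k)
            assert after >= Fraction(9, 8) * 4 ** k >= 3 * before
    return True
```

Using `Fraction` keeps the term $2^{2k-3}$ exact when $k = 0$ or $k = 1$, where it is not an integer.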

There are now two cases. The first is that there exists a $Y$ with Hamming weight 1 such that $v(A^{(Y)}) \geq \frac{2}{3} v(A)$.
In this case

$$\begin{aligned}
Q(A^{(Y)}) &= q(A^{(Y)}) \, v(A^{(Y)}) \\
&\geq 3 q(A) \cdot \frac{2}{3} v(A) \\
&= 2 q(A) v(A) \\
&= 2 Q(A).
\end{aligned}$$

So we simply set $S = S(Y)$ and are done.

The second case is that $v(A^{(Y)}) < \frac{2}{3} v(A)$ for all $Y$ with Hamming weight 1. In this case, we can consider
$v$ as a real multilinear polynomial in the bits of $Y \in \{0,1\}^{2^{3n}}$, of degree at most $\deg(Q) < \sqrt{2^{3n}}/7$. Then
Lemma 1 implies that there exists a $Y \in \{0,1\}^{2^{3n}}$ such that $|v(A^{(Y)})| = v(A^{(Y)}) \geq 6 v(A)$. Furthermore,
for all $Y$ we have

$$\frac{q(A^{(Y)})}{q(A)} \geq \frac{2^{2k-3}}{\frac{3}{8} \cdot 2^{2k}} = \frac{1}{3}.$$

Hence

$$\begin{aligned}
Q(A^{(Y)}) &= q(A^{(Y)}) \, v(A^{(Y)}) \\
&\geq \frac{1}{3} q(A) \cdot 6 v(A) \\
&= 2 q(A) v(A) \\
&= 2 Q(A).
\end{aligned}$$

So again we can set $S = S(Y)$. This completes the claim.

All that remains is to handle $\mathsf{PTIME}(n^{\log n})$ machines that could query any bit of the oracle string,
rather than just the bits corresponding to a specific $n$. The oracle $A$ will now take as input a list of strings
$R = (r_1, \ldots, r_\ell)$, with $r_l \in \{0,1\}^{5 \cdot 2^l}$ for all $l$, in addition to $i, x$. Call $R$ an $\ell$-secret if $A(R, i, x) = M_{i,x}(A)$
for all $n \leq 2^\ell$, $i \in \{1, \ldots, n\}$, and $x \in \{0,1\}^n$. Then we will try to satisfy the following.

(C′) There exists an infinite list of strings $r_1^*, r_2^*, \ldots$, such that $R_\ell^* := (r_1^*, \ldots, r_\ell^*)$ is an $\ell$-secret for all $\ell \geq 1$.

If (C′) is satisfied, then clearly each $M_i$ can be simulated by linear-size circuits. For all $n \geq i$, simply find
the smallest $\ell$ such that $2^\ell \geq n$, then hardwire $R_\ell^*$ into the circuit for size $n$. Since $2^\ell < 2n$, this requires at
most $5(2^1 + \cdots + 2^\ell) = 5(2^{\ell+1} - 2) < 20n$ bits.

To construct an oracle $A$ that satisfies (C′), we iterate over all $\ell \geq 1$. Suppose by induction that $R_{\ell-1}^*$
is an $(\ell - 1)$-secret; then we want to ensure that $R_\ell^*$ is an $\ell$-secret for some $r_\ell^* \in \{0,1\}^{5 \cdot 2^\ell}$. To do so, we
use a procedure essentially identical to the one for a specific $n$. The only difference is this: previously, all
we needed was a row $r \in \{0,1\}^{5n}$ such that no $(i, x)$ pair was sensitive to a particular change to $r$ (namely,
setting $A_t(r, i, x) := M_{i,x}(A_{t-1})$ for all $i, x$). But in the general case, the "row" labeled by $R = (r_1, \ldots, r_\ell)$
consists of all triples $(R', i, x)$ such that $R' = (r_1, \ldots, r_\ell, r'_{\ell+1}, \ldots, r'_L)$ for some $L \geq \ell$ and $r'_{\ell+1}, \ldots, r'_L$.
Furthermore, we do not yet know how later iterations will affect this "row." So we should call a pair
$(i, x)$ "sensitive" to $R$, if there is any oracle $A'$ such that (1) $A'$ disagrees with $A$ only in row $R$, and (2)
$M_{i,x}(A') \neq M_{i,x}(A)$.

Fortunately, this new notion of sensitivity requires no significant change to the proof. Suppose that for
every row $R$ of the form $(r_1^*, \ldots, r_{\ell-1}^*, r_\ell)$ there exists an $(i, x)$ that is sensitive to $R$. Then as before, there
exists an $(i', x')$ that is sensitive to at least $2^{5 \cdot 2^\ell} / \left(2^\ell \, 2^{2^\ell + 1}\right) \geq 2^{3n}$ rows of that form. For each of those
rows $R$, fix a change to $R$ to which $(i', x')$ is sensitive. We thereby obtain a polynomial $Q(A)$ with the same
properties as before; in particular, there exists a string $Y \in \{0,1\}^{2^{3n}}$ such that $Q(A^{(Y)}) \geq 2 Q(A)$. ∎

Let us make three remarks about Theorem 2.

(1) If we care about constants, it is clear that the advice $r$ can be reduced to $3n + o(n)$ bits for a specific
$n$, or $12n + o(n)$ for all $n$ simultaneously. Presumably these bounds are not tight.

(2) One can easily extend Theorem 2 to give an oracle relative to which $\mathsf{PE} = \mathsf{PTIME}(2^{O(n)})$ has linear-size
circuits, and hence $\mathsf{PEXP} \subset \mathsf{P/poly}$ by a padding argument.

(3) Han, Hemaspaandra, and Thierauf [16] showed that $\mathsf{MA} \subset \mathsf{BPP}_{\mathsf{path}} \subset \mathsf{PP}$. So in addition to implying
the result of Buhrman, Fortnow, and Thierauf that MA has linear-size circuits relative to an oracle,
Theorem 2 also yields the new result that $\mathsf{BPP}_{\mathsf{path}}$ has linear-size circuits relative to an oracle.

Another application of our techniques, the construction of relativized worlds where $\mathsf{P}^{\mathsf{NP}} = \mathsf{PEXP}$ and
$\oplus\mathsf{P} = \mathsf{PEXP}$, is outlined in Appendix 8.

3 Quantum Circuit Lower Bounds 

In this section we show, by a nonrelativizing argument, that PP does not have circuits of size $n^k$, not even
quantum circuits with quantum advice. We first show that $\mathsf{P}^{\mathsf{PP}}$ does not have quantum circuits of size $n^k$,
by a direct diagonalization argument. Our argument will use the following lemma of Aaronson [1].

Lemma 3 ("Almost As Good As New Lemma") Suppose a two-outcome measurement of a mixed quantum
state $\rho$ yields outcome 0 with probability $1 - \varepsilon$. Then after the measurement, we can recover a state $\tilde{\rho}$
such that $\|\tilde{\rho} - \rho\|_{\mathrm{tr}} \leq \sqrt{\varepsilon}$.

(Recall that the trace distance \\p — cr|| tr between two mixed states p and a is the maximum bias with 
which those states can be distinguished via a single measurement. In particular, trace distance satisfies the 
triangle inequality.) 
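
As a sanity check on Lemma 3, here is a one-qubit worked example. It is only a sketch under stated assumptions: the state $|\psi\rangle$, the choice of a computational-basis measurement, and the normalization $\|X\|_{\mathrm{tr}} = \frac{1}{2}\|X\|_1$ are ours, and the "recovery" step is taken to be doing nothing at all.

```latex
% Assumed example state: |psi> = sqrt(1-eps)|0> + sqrt(eps)|1>, with rho = |psi><psi|.
% Measuring in the computational basis yields outcome 0 with probability 1 - eps.
\[
\rho = \begin{pmatrix} 1-\varepsilon & \sqrt{\varepsilon(1-\varepsilon)} \\
                       \sqrt{\varepsilon(1-\varepsilon)} & \varepsilon \end{pmatrix},
\qquad
\tilde{\rho} = \begin{pmatrix} 1-\varepsilon & 0 \\ 0 & \varepsilon \end{pmatrix}
\quad \text{(state after the measurement, outcome forgotten).}
\]
% The difference has eigenvalues +/- sqrt(eps(1-eps)), so
\[
\|\tilde{\rho} - \rho\|_{\mathrm{tr}}
  = \tfrac{1}{2}\,\bigl\|\tilde{\rho} - \rho\bigr\|_1
  = \sqrt{\varepsilon(1-\varepsilon)}
  \;\le\; \sqrt{\varepsilon}.
\]
```

In this example the damage caused by the measurement is already within the $\sqrt{\varepsilon}$ bound of the lemma, and it vanishes as $\varepsilon \to 0$.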

Theorem 4 $\mathsf{P^{PP}}$ does not have quantum circuits of size $n^k$ for any fixed $k$. Furthermore, this holds even if the circuits can use quantum advice.

Proof. For simplicity, let us first explain why $\mathsf{P^{PP}}$ does not have classical circuits of size $n^k$. Fix an input length $n$, and let $x_1, \ldots, x_{2^n}$ be a lexicographic ordering of $n$-bit strings. Also, let $\mathcal{C}$ be the set of all circuits of size $n^k$, and let $\mathcal{C}_t \subseteq \mathcal{C}$ be the subset of circuits in $\mathcal{C}$ that correctly decide the first $t$ inputs $x_1, \ldots, x_t$. Then we define the language $L \cap \{0,1\}^n$ by the following iterative procedure. First, if at least half of the circuits in $\mathcal{C}$ accept $x_1$, then set $x_1 \notin L$, and otherwise set $x_1 \in L$. Next, if at least half of the circuits in $\mathcal{C}_1$ accept $x_2$, then set $x_2 \notin L$, and otherwise set $x_2 \in L$. In general, let $N = \lceil \log_2 |\mathcal{C}| \rceil + 1$. Then for all $t < N$, if at least half of the circuits in $\mathcal{C}_t$ accept $x_{t+1}$, then set $x_{t+1} \notin L$, and otherwise set $x_{t+1} \in L$. Finally, set $x_t \notin L$ for all $t > N$.

It is clear that the resulting language $L$ is in $\mathsf{P^{PP}}$. Given an input $x_t$, we just reject if $t > N$, and otherwise call the PP oracle $t$ times, to decide if $x_i \in L$ for each $i \in \{1, \ldots, t\}$. Note that, once we know whether each of $x_1, \ldots, x_i$ is in $L$, we can decide in polynomial time whether a given circuit belongs to $\mathcal{C}_i$, and can therefore decide in PP whether the majority of circuits in $\mathcal{C}_i$ accept or reject $x_{i+1}$. On the other hand, our construction guarantees that $|\mathcal{C}_{t+1}| \le |\mathcal{C}_t| / 2$ for all $t < N$. Therefore $|\mathcal{C}_N| \le |\mathcal{C}| / 2^N \le 1/2$, which means that $\mathcal{C}_N$ is empty, and hence no circuit in $\mathcal{C}$ correctly decides $x_1, \ldots, x_N$.
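
The halving argument can be tried out on a toy scale. The sketch below is illustrative only: a small explicit family of Boolean functions (the 16 affine functions on 3 bits, our own choice) stands in for "all circuits of size $n^k$", and brute-force counting stands in for the PP oracle. The names `diagonalize`, `circuits`, and `inputs` are hypothetical.

```python
from itertools import product
from math import ceil, log2

def diagonalize(circuits, inputs):
    """Define L on the first N inputs so that every circuit errs somewhere.

    circuits: list of callables mapping an input tuple to 0 or 1.
    inputs:   list of input tuples in lexicographic order.
    Returns (language, survivors), where language maps each input to 0/1
    and survivors is the set C_N of circuits still consistent with L.
    """
    n_steps = ceil(log2(len(circuits))) + 1       # N = ceil(log2 |C|) + 1
    assert n_steps <= len(inputs), "need at least N inputs"
    survivors = list(circuits)                     # C_0 = C
    language = {}
    for x in inputs[:n_steps]:
        accepting = sum(c(x) for c in survivors)
        # If at least half of the survivors accept x, put x outside L;
        # otherwise put x inside L. Either way at most half survive.
        language[x] = 0 if 2 * accepting >= len(survivors) else 1
        survivors = [c for c in survivors if c(x) == language[x]]
    for x in inputs[n_steps:]:
        language[x] = 0                            # x_t not in L for t > N
    return language, survivors

# Toy stand-in for the circuit class: all affine functions a.x + b (mod 2).
inputs = list(product([0, 1], repeat=3))
circuits = [
    (lambda x, a=a, b=b: (sum(ai * xi for ai, xi in zip(a, x)) + b) % 2)
    for a in product([0, 1], repeat=3) for b in (0, 1)
]
language, survivors = diagonalize(circuits, inputs)
```

Here $|\mathcal{C}| = 16$, so $N = 5$, and after five halving steps no function in the family agrees with `language` on all of the first five inputs.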

The above argument extends naturally to quantum circuits. Let $\mathcal{C}$ be the set of all quantum circuits of size $n^k$, over a basis of (say) Hadamard and Toffoli gates.¹⁰ (Note that these circuits need not be bounded-error.) Then the first step is to amplify each circuit $C \in \mathcal{C}$ a polynomial number of times, so that if $C$'s initial error probability was at most $1/3$, then its new error probability is at most (say) $2^{-10n}$. Let $\mathcal{C}'$ be the resulting set of amplified circuits. Now let $|\psi_0\rangle$ be a uniform superposition over all descriptions of circuits in $\mathcal{C}'$, together with an "answer register" that is initially set to $|0\rangle$:

$$|\psi_0\rangle := \frac{1}{\sqrt{|\mathcal{C}'|}} \sum_{C \in \mathcal{C}'} |C\rangle |0\rangle.$$

For each input $x_t \in \{0,1\}^n$, let $U_t$ be a unitary transformation that maps $|C\rangle|0\rangle$ to $|C\rangle|C(x_t)\rangle$ for each $C \in \mathcal{C}'$, where $|C(x_t)\rangle$ is the output of $C$ on input $x_t$. (In general, $|C(x_t)\rangle$ will be a superposition of $|0\rangle$ and $|1\rangle$.) To implement $U_t$, we simply simulate running $C$ on $x_t$, and then run the simulation in reverse to uncompute garbage qubits.

Let $N = \lceil \log_2 |\mathcal{C}'| \rceil + 2$. Also, given an input $x_t$, let $L(x_t) = 1$ if $x_t \in L$ and $L(x_t) = 0$ otherwise. Fix $t < N$, and suppose by induction that we have already set $L(x_i)$ for all $i \le t$. Then we will use the following quantum algorithm, called $A_t$, to set $L(x_{t+1})$.

    Set $|\psi\rangle := |\psi_0\rangle$
    For $i := 1$ to $t$
        Set $|\psi\rangle := U_i |\psi\rangle$
        Measure the answer register
        If the measurement outcome is not $L(x_i)$, then FAIL
    Next $i$
    Set $|\psi\rangle := U_{t+1} |\psi\rangle$
    Measure the answer register

¹⁰ Shi [31] showed that this basis is universal. Any finite, universal set of gates with rational amplitudes would work equally well.

Say that $A_t$ succeeds if it outputs $L(x_i)$ for all $x_1, \ldots, x_t$. Conditioned on $A_t$ succeeding, if the final measurement yields the outcome $|1\rangle$ with probability at least $1/2$, then set $L(x_{t+1}) := 0$, and otherwise set $L(x_{t+1}) := 1$. Finally, set $L(x_t) := 0$ for all $t > N$.

By a simple extension of the result $\mathsf{BQP} \subseteq \mathsf{PP}$ due to Adleman, DeMarrais, and Huang [3], Aaronson [2] showed that polynomial-time quantum computation with postselected measurement can be simulated in PP (indeed the two are equivalent; that is, $\mathsf{PostBQP} = \mathsf{PP}$). In particular, a PP machine can simulate the postselected quantum algorithm $A_t$ above, and thereby decide whether the final measurement will yield $|0\rangle$ or $|1\rangle$ with greater probability, conditioned on all previous measurements having yielded the correct outcomes. It follows that $L \in \mathsf{P^{PP}}$.

On the other hand, suppose by way of contradiction that there exists a quantum circuit $C \in \mathcal{C}'$ that outputs $L(x_t)$ with probability at least $1 - 2^{-10n}$ for all $t$. Then the probability that $C$ succeeds on $x_1, \ldots, x_N$ simultaneously is at least (say) $0.9$, by Lemma 3 together with the triangle inequality. Hence the probability that $A_t$ succeeds on $x_1, \ldots, x_N$ is at least $0.9 / |\mathcal{C}'|$. Yet by construction, $A_t$ succeeds with probability at most $1/2^t$, which is less than $0.9 / |\mathcal{C}'|$ when $t = N - 1$. This yields the desired contradiction.
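
The final comparison is a short calculation; the following sketch spells it out using only the definition $N = \lceil \log_2 |\mathcal{C}'| \rceil + 2$ from the proof.

```latex
% With t = N - 1 = ceil(log_2 |C'|) + 1:
\[
\frac{1}{2^{t}}
  = \frac{1}{2 \cdot 2^{\lceil \log_2 |\mathcal{C}'| \rceil}}
  \;\le\; \frac{1}{2\,|\mathcal{C}'|}
  \;<\; \frac{0.9}{|\mathcal{C}'|}.
\]
```

So the extra "$+2$" in $N$ (compared with "$+1$" in the classical case) is what absorbs the constant $0.9$.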

Finally, to incorporate quantum advice of size $s = n^k$, all we need to do is add an $s$-qubit "quantum advice register" to $|\psi_0\rangle$, which the $U_t$'s can use when simulating the circuits. We initialize this advice register to the maximally mixed state on $s$ qubits. The key fact is that, whatever the "true" advice state $|\phi\rangle$, we can decompose the maximally mixed state into

$$\frac{1}{2^s} \sum_{j=1}^{2^s} |\phi_j\rangle \langle \phi_j|,$$

where $|\phi_1\rangle, \ldots, |\phi_{2^s}\rangle$ form an orthonormal basis and $|\phi_1\rangle = |\phi\rangle$. By linearity, we can then track the evolution of each of these $2^s$ components independently. So the previous argument goes through as before, if we set $N = \lceil \log_2 |\mathcal{C}'| \rceil + s + 2$. (Note that we are assuming the advice states are suitably amplified, which increases the running time of $A_t$ by at most a polynomial factor.) ■

Similarly, for all time-constructible functions $f(n) \le 2^n$, one can show that the class $\mathsf{DTIME}(f(n))^{\mathsf{PP}}$ does not have quantum circuits of size $f(n)/n^2$. So for example, $\mathsf{E^{PP}}$ requires quantum circuits of exponential size.

Having shown a quantum circuit lower bound for $\mathsf{P^{PP}}$, we now bootstrap our way down to PP. To do so, we use the following "quantum Karp-Lipton theorem." Here BQP/poly is BQP with polynomial-size classical advice, BQP/qpoly is BQP with polynomial-size quantum advice, QMA is like MA but with quantum verifiers and quantum witnesses, and QCMA is like MA but with quantum verifiers and classical witnesses. Also, recall that the counting hierarchy CH is the union of $\mathsf{PP}$, $\mathsf{PP^{PP}}$, $\mathsf{PP^{PP^{PP}}}$, and so on.

Theorem 5 If $\mathsf{PP} \subset \mathsf{BQP/poly}$ then $\mathsf{QCMA} = \mathsf{PP}$, and indeed CH collapses to QCMA. Likewise, if $\mathsf{PP} \subset \mathsf{BQP/qpoly}$ then CH collapses to QMA.

Proof. Let $L$ be a language in CH. It is clear that we could decide $L$ in quantum polynomial time, if we were given polynomial-size quantum circuits for a PP-complete language such as MajSat. For Fortnow and Rogers showed that BQP is "low" for PP; that is, $\mathsf{PP^{BQP}} = \mathsf{PP}$. So we could use the quantum circuits for MajSat to collapse $\mathsf{PP^{PP}}$ to $\mathsf{PP^{BQP}} = \mathsf{PP}$ to BQP, and similarly for all higher levels of CH.

Assume $\mathsf{PP} \subset \mathsf{BQP/poly}$; then clearly $\mathsf{P^{\#P}} = \mathsf{P^{PP}}$ is contained in BQP/poly as well. So in QCMA we can do the following: first guess a bounded-error quantum circuit $C$ for computing the permanent of a $\mathrm{poly}(n) \times \mathrm{poly}(n)$ matrix over a finite field $\mathbb{F}_p$, for some prime $p = O(\mathrm{poly}(n))$. (For convenience, here $\mathrm{poly}(n)$ means "a sufficiently large polynomial depending on $L$.") Then verify that with $1 - o(1)$ probability, $C$ works on at least a $1 - 1/\mathrm{poly}(n)$ fraction of matrices. To do so, simply simulate the interactive protocol for the permanent due to Lund, Fortnow, Karloff, and Nisan, but with $C$ in place of the prover. Next, use the random self-reducibility of the permanent to generate a new circuit $C'$ that, with $1 - o(1)$ probability, is correct on every $\mathrm{poly}(n) \times \mathrm{poly}(n)$ matrix over $\mathbb{F}_p$. Since Permanent is #P-complete over all fields of characteristic $p \ne 2$ [37], we can then use $C'$ to decide MajSat instances of size $\mathrm{poly}(n)$, and therefore the language $L$ as well.

The case $\mathsf{PP} \subset \mathsf{BQP/qpoly}$ is essentially identical, except that in QMA we guess a quantum circuit with quantum advice. That quantum advice states cannot be reused indefinitely does not present a problem here: we simply guess a boosted circuit, or else $\mathrm{poly}(n)$ copies of the original circuit. ■

By combining Theorems 4 and 5 we immediately obtain the following.

Corollary 6 PP does not have quantum circuits of size $n^k$ for any fixed $k$, not even quantum circuits with quantum advice.

Proof. Suppose by contradiction that PP had such circuits. Then certainly $\mathsf{PP} \subset \mathsf{BQP/qpoly}$, so $\mathsf{QMA} = \mathsf{PP} = \mathsf{P^{PP}} = \mathsf{CH}$ by Theorem 5. But $\mathsf{P^{PP}}$ does not have such circuits by Theorem 4, and therefore neither does PP. ■

More generally, for all $f(n) \le 2^n$ we find that $\mathsf{PTIME}(f(f(n)))$ requires quantum circuits of size approximately $f(n)$. For example, PEXP requires quantum circuits of "half-exponential" size.

Finally, we point out a quantum analogue of Buhrman, Fortnow, and Thierauf's classical nonrelativizing separation [10].

Theorem 7 $\mathsf{QCMA_{EXP}} \not\subset \mathsf{BQP/poly}$, and $\mathsf{QMA_{EXP}} \not\subset \mathsf{BQP/qpoly}$.

Proof. Suppose by contradiction that $\mathsf{QCMA_{EXP}} \subset \mathsf{BQP/poly}$. Then clearly $\mathsf{EXP} \subset \mathsf{BQP/poly}$ as well. Babai, Fortnow, and Lund showed that any language in EXP has a two-prover interactive protocol where the provers are in EXP. We can simulate such a protocol in QCMA as follows: first guess (suitably amplified) BQP/poly circuits computing the provers' strategies. Then simulate the provers and verifier, and accept if and only if the verifier accepts. It follows that $\mathsf{EXP} = \mathsf{QCMA}$, and therefore $\mathsf{QCMA} = \mathsf{P^{PP}}$ as well. So by padding, $\mathsf{QCMA_{EXP}} = \mathsf{EXP^{PP}}$. But we know from Theorem 4 that $\mathsf{EXP^{PP}} \not\subset \mathsf{BQP/poly}$, which yields the desired contradiction. The proof that $\mathsf{QMA_{EXP}} \not\subset \mathsf{BQP/qpoly}$ is essentially identical, except that we guess quantum circuits with quantum advice. ■

One can strengthen Theorem 7 to show that $\mathsf{QMA_{EXP}}$ requires quantum circuits of half-exponential size. However, in contrast to the case for PEXP, here the bound does not scale down to QMA. Indeed, it turns out that the smallest $f$ for which we get any superlinear circuit size lower bound for $\mathsf{QMATIME}(f(n))$ is itself half-exponential.



4 The Oracle for $\mathsf{BPP}^{\mathsf{NP}}_{\|}$

In this section we construct an oracle relative to which $\mathsf{BPP}^{\mathsf{NP}}_{\|}$ has linear-size circuits.

Theorem 8 There exists an oracle relative to which $\mathsf{BPP}^{\mathsf{NP}}_{\|}$ has linear-size circuits.

Proof. As in Theorem 2, we first give an oracle $A$ that works for a specific value of $n$. Let $M_1, M_2, \ldots$ be an enumeration of "syntactic" $\mathsf{BPTIME}(n^{\log n})^{\mathsf{NP}}_{\|}$ machines, where syntactic means not necessarily satisfying the promise. Then it suffices to simulate $M_1, \ldots, M_n$. We assume without loss of generality that only the NP oracle (not the $M_i$'s themselves) query $A$, and that each NP call is actually an $\mathsf{NTIME}(n)$ call (so in particular, it involves at most $n^{\log n}$ queries to $A$). Let $M_{i,x,z}(A)$ be a Boolean function that equals $1$ if $M_i$ accepts on input $x \in \{0,1\}^n$, random string $z \in \{0,1\}^{n^{\log n}}$, and oracle $A$, and $0$ otherwise. Then let $p_{i,x}(A) := \mathrm{EX}_z [M_{i,x,z}(A)]$ be the probability that $M_i$ accepts $x$.

The oracle $A$ will consist of $2^{3n}$ rows and $n 2^n$ columns, with each row labeled by $r \in \{0,1\}^{3n}$, and each column labeled by an $(i, x)$ pair where $i \in \{1, \ldots, n\}$ and $x \in \{0,1\}^n$. We will construct $A$ via an iterative procedure $\mathcal{P}$. Initially $A$ is empty (that is, $A(r,i,x) = 0$ for all $r,i,x$). Let $A_t$ be the state of $A$ after the $t$-th iteration. Then to encode a row $r$ means to set $A_t(r,i,x) := \mathrm{round}(p_{i,x}(A_{t-1}))$ for all $i, x$, where $\mathrm{round}(p) = 1$ if $p \ge 1/2$ and $\mathrm{round}(p) = 0$ if $p < 1/2$.

Call an $(i, x)$ pair sensitive to row $r$, if encoding $r$ would change $p_{i,x}(A)$ by at least $1/6$. Then $\mathcal{P}$ consists entirely of repeating the following two steps, for $t = 1, 2, 3, \ldots$:

(1) If there exists an $r$ to which no $(i, x)$ is sensitive, then encode $r$ and halt.

(2) Otherwise, by a counting argument, there exists a pair $(j, y)$ that is sensitive to at least $N = 2^{3n} / (n 2^n)$ rows, call them $r_1, \ldots, r_N$. Let $A^{(k)}$ be the oracle obtained by starting from $A$ and then encoding $r_k$. Choose an integer $k \in \{1, \ldots, N\}$ (we will specify how later), and set $A_t := A_{t-1}^{(k)}$.

Suppose $\mathcal{P}$ halts after $t$ iterations, and let $r$ be the row encoded by step (1). Then by assumption, $|p_{i,x}(A_t) - p_{i,x}(A_{t-1})| < 1/6$ for all $i, x$. So in particular, if $p_{i,x}(A_t) \ge 2/3$ then $p_{i,x}(A_{t-1}) > 1/2$ and therefore $A_t(r,i,x) = 1$. Likewise, if $p_{i,x}(A_t) \le 1/3$ then $p_{i,x}(A_{t-1}) < 1/2$ and therefore $A_t(r,i,x) = 0$. It follows that any valid $\mathsf{BPTIME}(n^{\log n})^{\mathsf{NP}}_{\|}$ machine in $\{M_1, \ldots, M_n\}$ has linear-size circuits relative to $A_t$, since we can just hardwire $r \in \{0,1\}^{3n}$ into the circuits.

It remains only to show that $\mathcal{P}$ halts after a finite number of steps, for some choice of $k$'s. Given an input $x$, random string $z$, and oracle $A$, let $S_{i,x,z}(A)$ be the set of NP queries made by $M_i$ that accept. Then we will use

$$W(A) := \sum_{i,x} \mathrm{EX}_z \bigl[ |S_{i,x,z}(A)| \bigr]$$

as our progress measure. Since each $M_i$ can query the NP oracle at most $n^{\log n}$ times, clearly $0 \le |S_{i,x,z}(A)| \le n^{\log n}$ for all $i, x, z$, and therefore

$$0 \le W(A) \le n 2^n \cdot n^{\log n}$$

for all $A$. On the other hand, we claim that whenever step (2) is executed, if $k \in \{1, \ldots, N\}$ is chosen uniformly at random then

$$\mathrm{EX}_k \bigl[ W(A^{(k)}) \bigr] \ge W(A) + \frac{1}{6} - 2^{-n+o(n)}.$$

So in step (2), we should simply choose $k$ to maximize $W(A^{(k)})$. For we will then have $W(A_t) \ge (1/6 - 2^{-n+o(n)})\, t$ for all $t$, from which it follows that $\mathcal{P}$ halts after at most

$$\frac{n 2^n \cdot n^{\log n}}{1/6 - 2^{-n+o(n)}} = 2^{n+o(n)}$$

iterations.
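
For readers who want the bookkeeping behind this iteration bound, here is a sketch of the potential-function argument. The symbols $\delta$ and $T$ are our own shorthand, and base-2 logarithms are assumed.

```latex
% Let delta := 1/6 - 2^{-n+o(n)} be the guaranteed gain per execution of step (2),
% and let T be the number of iterations. Choosing the maximizing k does at least
% as well as the average over k, so W grows by at least delta each time:
\[
T \cdot \delta \;\le\; W(A_T) \;\le\; n 2^n \cdot n^{\log n}
\quad\Longrightarrow\quad
T \;\le\; \frac{n 2^n \cdot n^{\log n}}{\delta}.
\]
% Since n * n^{log n} = 2^{log n + log^2 n} = 2^{o(n)} and 1/delta = O(1):
\[
T \;\le\; 2^{n} \cdot 2^{o(n)} \cdot O(1) \;=\; 2^{n+o(n)}.
\]
```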

We now prove the claim. Observe that for each accepting NP query $q \in S_{i,x,z}(A)$, there are at most $n^{\log n}$ rows $r_k$ such that encoding $r_k$ would cause $q \notin S_{i,x,z}(A^{(k)})$. For to change $q$'s output from 'accept' to 'reject,' we would have to eliminate (say) the lexicographically first accepting path of the NP oracle, and that path can depend on at most $n^{\log n}$ rows of $A$. Hence by the union bound, for all $i, x, z, A$ we have

$$\begin{aligned}
\Pr_k \bigl[ S_{i,x,z}(A) \not\subseteq S_{i,x,z}(A^{(k)}) \bigr]
  &\le \sum_{q \in S_{i,x,z}(A)} \Pr_k \bigl[ q \notin S_{i,x,z}(A^{(k)}) \bigr] \\
  &\le |S_{i,x,z}(A)| \cdot \frac{n^{\log n}}{N} \\
  &\le \frac{n^{2 \log n}}{2^{3n} / (n 2^n)} \\
  &= 2^{-2n+o(n)}.
\end{aligned}$$

So in particular, for all $i, x, A$,

$$\begin{aligned}
\mathrm{EX}_{k,z} \bigl[ |S_{i,x,z}(A^{(k)})| \bigr]
  &\ge \mathrm{EX}_z \Bigl[ |S_{i,x,z}(A)| \cdot \Pr_k \bigl[ S_{i,x,z}(A) \subseteq S_{i,x,z}(A^{(k)}) \bigr] \Bigr] \\
  &\ge \mathrm{EX}_z \bigl[ |S_{i,x,z}(A)| \bigr] \left( 1 - 2^{-2n+o(n)} \right).
\end{aligned}$$

On the other hand, by assumption there exists a pair $(j, y)$ that is sensitive to row $r_k$ for every $k \in \{1, \ldots, N\}$. Furthermore, given $y$ and $z$, the output $M_{j,y,z}(A)$ of $M_j$ is a function of the NP oracle responses $S_{j,y,z}(A)$, and can change only if $S_{j,y,z}(A)$ changes. Therefore

$$\Pr_{k,z} \bigl[ S_{j,y,z}(A^{(k)}) \ne S_{j,y,z}(A) \bigr]
  \ge \Pr_{k,z} \bigl[ M_{j,y,z}(A^{(k)}) \ne M_{j,y,z}(A) \bigr]
  \ge \frac{1}{6}.$$

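So by the union bound,

$$\begin{aligned}
\Pr_{k,z} \bigl[ |S_{j,y,z}(A^{(k)})| > |S_{j,y,z}(A)| \bigr]
  &\ge \Pr_{k,z} \bigl[ S_{j,y,z}(A^{(k)}) \ne S_{j,y,z}(A) \bigr]
     - \Pr_{k,z} \bigl[ S_{j,y,z}(A) \not\subseteq S_{j,y,z}(A^{(k)}) \bigr] \\
  &\ge \frac{1}{6} - 2^{-2n+o(n)}.
\end{aligned}$$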



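Putting it all together,

$$\begin{aligned}
\mathrm{EX}_k \bigl[ W(A^{(k)}) \bigr]
  &= \sum_{i,x} \mathrm{EX}_{k,z} \bigl[ |S_{i,x,z}(A^{(k)})| \bigr] \\
  &\ge \frac{1}{6} - 2^{-2n+o(n)} + \sum_{i,x} \mathrm{EX}_z \bigl[ |S_{i,x,z}(A)| \bigr] \left( 1 - 2^{-2n+o(n)} \right) \\
  &= \frac{1}{6} - 2^{-2n+o(n)} + \left( 1 - 2^{-2n+o(n)} \right) W(A) \\
  &\ge W(A) + \frac{1}{6} - 2^{-n+o(n)},
\end{aligned}$$

which completes the claim.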

To handle all values of $n$ simultaneously, we use exactly the same trick as in Theorem 2. That is, we replace $r$ by an $\ell$-tuple $R = (r_1, \ldots, r_\ell)$ where $r_\ell \in \{0,1\}^{3 \cdot 2^\ell}$; define the "row" $\mathcal{R}_\ell$ to consist of all triples $(R'_L, i, x)$ such that $L \ge \ell$ and $r'_h = r_h$ for all $h \le \ell$; and call the pair $(i, x)$ "sensitive" to row $\mathcal{R}_\ell$ if there is any oracle $A'$ that disagrees with $A$ only in $\mathcal{R}_\ell$, such that $|p_{i,x}(A') - p_{i,x}(A)| \ge 1/6$. We then run the procedure $\mathcal{P}$ repeatedly to encode $r_1, r_2, \ldots$, where "encoding" $r_\ell$ means setting $A_t(R_\ell, i, x) := \mathrm{round}(p_{i,x}(A_{t-1}))$ for all $n \le 2^\ell$, $i \in \{1, \ldots, n\}$, and $x \in \{0,1\}^n$. The rest of the proof goes through as before. ■

Let us make six remarks about Theorem 8.



(1) An immediate corollary is that any Karp-Lipton collapse to $\mathsf{BPP}^{\mathsf{NP}}_{\|}$ would require nonrelativizing techniques. For relative to the oracle $A$ from the theorem, we have $\mathsf{NP} \subseteq \mathsf{BPP}^{\mathsf{NP}}_{\|} \subset \mathsf{P/poly}$. On the other hand, if $\mathsf{PH} = \mathsf{BPP}^{\mathsf{NP}}_{\|}$, then $\mathsf{BPP}^{\mathsf{NP}}_{\|}$ would not have linear-size circuits by Kannan's Theorem [18] (which relativizes), thereby yielding a contradiction.

(2) If we care about constants, we can reduce the advice $r$ to $2n + o(n)$ bits for a specific $n$, or $8n + o(n)$ for all $n$ simultaneously.

(3) As with Theorem 2, one can easily modify Theorem 8 to give a relativized world where $\mathsf{BPEXP}^{\mathsf{NP}}_{\|} \subset \mathsf{P/poly}$. Thus, Theorem 8 provides an alternate generalization of the result of Buhrman, Fortnow, and Thierauf [10] that $\mathsf{MA_{EXP}} \subset \mathsf{P/poly}$ relative to an oracle.

(4) Since $\mathsf{BPP_{path}} \subseteq \mathsf{BPP}^{\mathsf{NP}}_{\|}$ (as is not hard to show using approximate counting), Theorem 8 also provides an alternate proof that $\mathsf{BPP_{path}}$ has linear-size circuits relative to an oracle.

(5) Completely analogously to Theorem 12, one can modify Theorem 8 to give oracles relative to which $\mathsf{P^{NP}} = \mathsf{BPEXP}^{\mathsf{NP}}_{\|}$ and $\oplus\mathsf{P} = \mathsf{BPEXP}^{\mathsf{NP}}_{\|}$.

(6) For any function $f$, the construction of Theorem 8 actually yields an oracle relative to which $\mathsf{BPP}^{\mathsf{NP}[f(n)]}$ (that is, BPP with $f(n)$ adaptive NP queries) has circuits of size $O(n + f(n))$. For clearly we can simulate $f(n)$ adaptive queries using $2^{f(n)}$ nonadaptive queries. We then repeat Theorem 8 with the bound $0 \le W(A) \le n 2^n \cdot 2^{f(n)}$.

5 Black-Box Learning in Algorithmica

"Algorithmica" is one of Impagliazzo's five possible worlds [17], the world in which $\mathsf{P} = \mathsf{NP}$. In this section we show that in Algorithmica, black-box learning of Boolean circuits is possible in $\mathsf{P}^{\mathsf{NP}}_{\|}$. Let us first define what we mean by black-box learning.

Definition 9 Say that black-box learning is possible in a complexity class $\mathcal{C}$ if the following holds. There exists a $\mathcal{C}$ machine $M$ such that, for all Boolean functions $f : \{0,1\}^n \to \{0,1\}$ with circuit complexity at most $s(n)$, the machine outputs a circuit for $f$ given $\langle 0^n, 0^{s(n)} \rangle$ as input. Also, $M$ has approximation ratio $\alpha(n)$ if for all $f$, any circuit output by $M$ has size at most $s(n)\,\alpha(n)$.

The above definition is admittedly somewhat vague, but for most natural complexity classes $\mathcal{C}$ it is clear how to make it precise. Firstly, by "$\mathcal{C}$ machine" we really mean "$\mathsf{F}\mathcal{C}$ machine," where $\mathsf{F}\mathcal{C}$ is the function version of $\mathcal{C}$. Secondly, for semantic classes, we do not care if the machine violates the promise on inputs not of the form $\langle 0^n, 0^{s(n)} \rangle$, or oracles $f$ that do not have circuit complexity at most $s(n)$. Let us give a few examples.

• Almost by definition, black-box learning is possible in $\Sigma_2^p$ with approximation ratio 1.

• As pointed out by Umans [20], the result of Bshouty et al. implies that black-box learning is possible in $\mathsf{ZPP^{NP}}$, with approximation ratio $O(n / \log n)$.

• Under standard derandomization assumptions, black-box learning is possible in $\mathsf{P^{NP}}$ with approximation ratio $O(n / \log n)$, and in PP with approximation ratio 1. For not only do these assumptions imply that $\mathsf{ZPP^{NP}} = \mathsf{P^{NP}}$ and that $\mathsf{BP} \cdot \mathsf{PP} = \mathsf{PP}$, but they also yield a black-box simulation of a $\mathsf{ZPP^{NP}}$ or $\mathsf{BP} \cdot \mathsf{PP}$ algorithm that learns a circuit for $f$ by just querying an existing circuit $C$ on various inputs (without "cheating" and looking at $C$).

On the other hand: 

Proposition 10 Black-box learning is impossible in NP, or for that matter in AM, IP, or MIP. 

Proof. Suppose there are two possibilities: either $f$ is the identically zero function, or else $f$ is a point function (that is, there exists a $y$ such that $f(x) = 1$ if and only if $x = y$). In both cases $s(n) = O(n)$. But since the verifier has only oracle access to $f$, it is obvious that no polynomially-bounded sequence of messages from the prover(s) could convince the verifier that $f$ is identically zero. We omit the details, which were worked out by Fortnow and Sipser [15]. ■

We now prove the main result.

Theorem 11 If $\mathsf{P} = \mathsf{NP}$, then black-box learning is possible in $\mathsf{P}^{\mathsf{NP}}_{\|}$ (indeed, with approximation ratio 1).

Proof. We use a procedure inspired by that of Bshouty et al.

Fix $n$, and suppose $f : \{0,1\}^n \to \{0,1\}$ has circuits of size $s = s(n)$. Let $\mathcal{B}$ be the set of all circuits of size $s$, so that $|\mathcal{B}| = s^{O(s)}$. Also, say that a circuit $C \in \mathcal{B}$ succeeds on input $x \in \{0,1\}^n$ if $C(x) = f(x)$, and fails otherwise. Then given a list of inputs $X = (x_1, x_2, \ldots)$, let $\mathcal{B}(X)$ be the set of circuits in $\mathcal{B}$ that succeed on every $x \in X$.

For the remainder of the proof, let $X_t = (x_1, \ldots, x_t)$ be a list of $t$ inputs, and for all $0 \le i \le t$, let $X_i = (x_1, \ldots, x_i)$ be the prefix of $X_t$ consisting of the first $i$ inputs (so in particular, $X_0$ is the empty list). Then our first claim is that there exists an $\mathsf{NP}^f$ machine $Q_t$ with the following behavior:

• If there exists an $X_t$ such that $|\mathcal{B}(X_i)| \le \frac{2}{3} |\mathcal{B}(X_{i-1})|$ for all $i \in \{1, \ldots, t\}$, then $Q_t$ accepts.

• If for all $X_t$ there exists an $i \in \{1, \ldots, t\}$ such that $|\mathcal{B}(X_i)| > \frac{3}{4} |\mathcal{B}(X_{i-1})|$, then $Q_t$ rejects.

(As usual, if neither of the two stated conditions holds, then the machine can behave arbitrarily.)

In what follows, we can assume without loss of generality that $t$ is polynomially bounded. For, since some circuit $C \in \mathcal{B}$ succeeds on every input, we must have $|\mathcal{B}(X_i)| \ge 1$ for all $i$. Therefore $Q_t$ can accept only if $|\mathcal{B}| (3/4)^t \ge 1$, or equivalently if $t = O(s \log s)$.
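
The bound on $t$ is a one-line calculation, sketched here from the estimate $|\mathcal{B}| = s^{O(s)}$ used in the proof.

```latex
% Each of the t steps shrinks the surviving set by a factor of at most 3/4,
% and at least one circuit (a correct circuit for f) always survives:
\[
1 \;\le\; |\mathcal{B}(X_t)| \;\le\; |\mathcal{B}| \left( \tfrac{3}{4} \right)^{t}
\quad\Longrightarrow\quad
t \;\le\; \frac{\log |\mathcal{B}|}{\log (4/3)}
  \;=\; \frac{O(s \log s)}{\log (4/3)}
  \;=\; O(s \log s).
\]
```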

Let $f(X_t) := (f(x_1), \ldots, f(x_t))$, and let $z$ be a "witness string" consisting of $X_t$ and $f(X_t)$. Then given $z$ and $i \le t$, we can easily decide whether a circuit $C$ belongs to the set $\mathcal{B}(X_i)$: we simply check whether $C(x_j) = f(x_j)$ for all $j \le i$. So by standard results on approximate counting due to Stockmeyer [33] and Sipser [32], we can approximate the cardinality $|\mathcal{B}(X_i)|$ in $\mathsf{BPP^{NP}}$. More precisely, for all $t, i$ there exists a $\mathsf{PromiseBPP^{NP}}$ machine $M_{t,i}$ such that for all $z = (X_t, f(X_t))$:

• If $|\mathcal{B}(X_i)| \le \frac{2}{3} |\mathcal{B}(X_{i-1})|$ then $M_{t,i}(z)$ accepts with probability at least $2/3$ (where the probability is over $M_{t,i}$'s internal randomness).

• If $|\mathcal{B}(X_i)| > \frac{3}{4} |\mathcal{B}(X_{i-1})|$ then $M_{t,i}(z)$ rejects with probability at least $2/3$.

Now by the Sipser-Lautemann Theorem, the assumption $\mathsf{P} = \mathsf{NP}$ implies that $\mathsf{PromiseP} = \mathsf{PromiseBPP^{NP}}$ as well. So we can convert $M_{t,i}$ into a deterministic polynomial-time machine $M'_{t,i}$ such that for all $z$:

• If $|\mathcal{B}(X_i)| \le \frac{2}{3} |\mathcal{B}(X_{i-1})|$ then $M'_{t,i}(z)$ accepts.

• If $|\mathcal{B}(X_i)| > \frac{3}{4} |\mathcal{B}(X_{i-1})|$ then $M'_{t,i}(z)$ rejects.

Using $M'_{t,i}$ we can then rewrite $Q_t$ as follows.

"Does there exist a witness $z$, of the form $(X_t, f(X_t))$, such that $M'_{t,1}(z) \wedge \cdots \wedge M'_{t,t}(z)$?"

This proves the claim, since the above query is clearly in $\mathsf{NP}^f$.

To complete the theorem, we will need one other predicate $A_t(z, x)$, with the following behavior. For all $z = (X_t, f(X_t))$ and $x \in \{0,1\}^n$:

• If $\Pr_{C \in \mathcal{B}(X_t)} [C(x) = 1] \ge 2/3$ then $A_t(z, x)$ accepts.

• If $\Pr_{C \in \mathcal{B}(X_t)} [C(x) = 0] \ge 2/3$ then $A_t(z, x)$ rejects.

It is clear that we can implement $A_t$ in $\mathsf{PromiseBPP^{NP}}$, again because of approximate counting and the ease of deciding membership in $\mathcal{B}(X_t)$. So by the assumption $\mathsf{P} = \mathsf{NP}$, we can also implement $A_t$ in P.

Now let $C_{t,z}$ be the lexicographically first circuit $C \in \mathcal{B}$ such that $C(x) = A_t(z, x)$ for all $x \in \{0,1\}^n$. Notice that $A_t(z, x)$ is an explicit procedure: that is, we can evaluate it without recourse to the oracle for $f$. So given $z$, we can find $C_{t,z}$ in $\Delta_3^p = \mathsf{P^{NP^{NP}}}$, and hence also in P.

Let $t^*$ be the maximum $t$ for which $Q_t$ accepts, and let $z = (X_{t^*}, f(X_{t^*}))$ be any accepting witness for $Q_{t^*}$. Then for all $x \in \{0,1\}^n$, we have

$$\Pr_{C \in \mathcal{B}(X_{t^*})} [C(x) = f(x)] \ge \frac{2}{3}.$$

For otherwise the sequence $(x_1, \ldots, x_{t^*}, x)$ would satisfy $Q_{t^*+1}$, thereby contradicting the maximality of $t^*$. An immediate corollary is that $A_{t^*}(z, x) = f(x)$ for all $x \in \{0,1\}^n$. Hence $C_{t^*,z}$ is the lexicographically first circuit for $f$, independently of the particular accepting witness $z$.

The $\mathsf{P}^{\mathsf{NP}}_{\|}$ learning algorithm now follows easily. For all $t = O(s \log s)$, the algorithm submits the query $Q_t$ to the NP oracle. It also submits the following query, called $R_{t,j}$, for all $t = O(s \log s)$ and $j = O(s \log s)$:

"Does there exist a witness $z = (X_t, f(X_t))$ satisfying $Q_t$, such that the $j$-th bit in the description of $C_{t,z}$ is a 1?"

Using the responses to the $Q_t$'s, the algorithm then determines $t^*$. Finally it reads a description of $C_{t^*,z}$ off the responses to the $R_{t^*,j}$'s. ■

Theorem 11 has the following easy corollaries. First, we cannot show that a Karp-Lipton collapse to $\mathsf{P}^{\mathsf{NP}}_{\|}$ would require non-black-box techniques, without also showing $\mathsf{P} \ne \mathsf{NP}$. Second, if $\mathsf{P} = \mathsf{NP}$, then black-box learning is possible in NP/log. For since the $\mathsf{P}^{\mathsf{NP}}_{\|}$ algorithm of Theorem 11 does not take any input, we simply count how many of its NP queries return a positive answer, and then feed that number as advice to the NP/log machine.
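
The combinatorial core of the proof of Theorem 11 (keep adding inputs that shrink the consistent set by a constant factor; once no such input exists, a majority vote over the survivors agrees with $f$ everywhere) can be tried out by brute force. The sketch below is a toy under loud assumptions: exhaustive search replaces the NP oracle and approximate counting, it builds the input list greedily rather than finding the maximum $t^*$, the class `B` (affine functions on 3 bits) stands in for "all circuits of size $s$", and the cut threshold $2/3$ is our choice. All names are hypothetical.

```python
from itertools import product

def consistent(B, f, X):
    """Members of B that agree with the target f on every input in X."""
    return [c for c in B if all(c[x] == f[x] for x in X)]

def learn(B, f, inputs):
    """Return (queried inputs, majority-vote hypothesis) for target f in B."""
    X = []
    while True:
        cur = consistent(B, f, X)
        # Look for an input whose answer cuts the consistent set to <= 2/3.
        cut = next(
            (x for x in inputs
             if 3 * len(consistent(B, f, X + [x])) <= 2 * len(cur)),
            None,
        )
        if cut is None:
            break            # every input is now decided by a > 2/3 majority
        X.append(cut)
    cur = consistent(B, f, X)
    # Majority vote of the surviving functions, input by input.
    hypothesis = {x: int(2 * sum(c[x] for c in cur) > len(cur)) for x in inputs}
    return X, hypothesis

# Toy concept class: truth tables of all affine functions a.x + b (mod 2).
inputs = list(product([0, 1], repeat=3))
B = [
    {x: (sum(ai * xi for ai, xi in zip(a, x)) + b) % 2 for x in inputs}
    for a in product([0, 1], repeat=3) for b in (0, 1)
]
target = B[11]               # an arbitrary member of the class
queries, hypothesis = learn(B, target, inputs)
```

In this toy run the learner needs fewer queries than there are inputs, because each accepted query removes at least a third of the remaining candidates.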

6 Open Problems 

The main open problem is, of course, to prove better nonrelativizing lower bounds. For example, can we show that $\mathsf{BPP}^{\mathsf{NP}}_{\|}$ does not have linear-size circuits? To do so, we would presumably need a nonrelativizing technique that applies directly to the polynomial hierarchy, without requiring the full strength of #P. Arora, Impagliazzo, and Vazirani argue that "local checkability," as used for example in the PCP Theorem, constitutes such a technique (though see Fortnow [12] for a contrary view). For us, the relevant question now is not which techniques are "truly" nonrelativizing, but simply which ones lead to lower bounds!

Here are a few other problems.

• Can we show that $\mathsf{P^{NP}} \ne \mathsf{PEXP}$? If so, then we would obtain perhaps the first nonrelativizing separation of uniform complexity classes that does not follow immediately from a collapse such as $\mathsf{IP} = \mathsf{PSPACE}$ or $\mathsf{MIP} = \mathsf{NEXP}$.

• Can we show that PEXP requires circuits of exponential size, rather than just half-exponential?

• As mentioned in Section 1.2, Bshouty et al.'s algorithm does not find a minimal circuit for a Boolean function $f$, but only a circuit within an $O(n / \log n)$ factor of minimal.¹¹ Can we improve this approximation ratio, or alternatively, show that doing so would require nonrelativizing techniques?

• Is black-box learning possible in $\mathsf{P}^{\mathsf{NP}}_{\|}$ or $\mathsf{ZPP}^{\mathsf{NP}}_{\|}$, under some computational assumption that we actually believe (for example, a derandomization assumption)? Alternatively, can we show that black-box learning is impossible in $\mathsf{P}^{\mathsf{NP}}_{\|}$ under some plausible computational assumption?

8 Appendix: A Really Big Crunch

By slightly modifying the construction of Theorem 2, we can resolve two other open questions of Fortnow.

Theorem 12

(i) There exists an oracle relative to which $\mathsf{P^{NP}} = \mathsf{PEXP}$; and indeed $\mathsf{P^{NP}} = \mathsf{P^{NP^{PEXP}}}$.

(ii) There exists an oracle relative to which $\oplus\mathsf{P} = \mathsf{PEXP}$.

Proof.

(i) In the oracle construction of Theorem 2 dealing with all $n$ simultaneously, make the following simple change. Whenever a row $R$ gets encoded, record the "current time" $t$ as a prefix to that row. In other words, the oracle $A$ will now take two kinds of queries: those of the form $(R, i, x)$ as before, and those of the form $(R, j)$ for an integer $j \ge 0$. Initially $A(R, j) = 0$ for all $R, j$. At any step of the iterative procedure, let $t$ be the number of encoding steps that have already occurred. Then call the pair $(i, x)$ "sensitive" to row $R$, if there exists an oracle $A'$ such that

• $A'$ disagrees with $A$ only in row $R$,

• $M_{i,x}(A') \ne M_{i,x}(A)$, and

• as we range over $j$, the $A'(R, j)$'s encode the binary expansion of $t + 1$.

Clearly the proof of Theorem 2 still goes through with this change. For let $\ell = \lceil \log_2 n \rceil$. Then as before, whenever there does not exist a row $R$ of the form $(r_1^*, \ldots, r_{\ell-1}^*, r_\ell)$ to which no $(i, x)$ is sensitive, we can encode a subset of those rows so as to double $Q(A)$. Since $2^{-2^{O(n)}} \le Q(A) \le 2^{2^{O(n)}}$ for all $A$, this process will halt after at most $2^{O(n)}$ steps, meaning that $t$ will never require more than $O(n)$ bits to represent. Indeed, this is true even if we are dealing with $\mathsf{PTIME}(2^n)$ machines, rather than $\mathsf{PTIME}(n^{\log n})$ machines.

Now consider a $\mathsf{PTIME}^A(2^n)$ machine $M_i$. We can simulate $M_i$ in $\mathsf{DTIME}(n^2)^{\mathsf{NP}}$, as follows. Given an input $x \in \{0,1\}^n$, first find the unique row $R = (r_1, \ldots, r_{\lceil \log_2 n \rceil})$ for which $t$ is maximal; in other words, the last such row to have been encoded. This requires $O(n)$ adaptive queries to the NP oracle, each of size $O(n)$. Then output $A(R, i, x)$.

It follows that $\mathsf{DTIME}(n^2)^{\mathsf{NP}} = \mathsf{PE}$ relative to $A$, and (by padding) that $\mathsf{P^{NP}} = \mathsf{PEXP}$. Indeed, once the $\mathsf{P^{NP}}$ machine finds the $r_\ell$'s, it can use them to decide an arbitrary language in $\mathsf{P^{NP^{PEXP}}}$, which is why $\mathsf{P^{NP}} = \mathsf{P^{NP^{PEXP}}}$ as well.

(ii) In this case the change to Theorem 2 is even simpler. Whenever we encode a row $R = (r_1, \ldots, r_\ell)$, instead of setting $A_t(R, i, x) := M_{i,x}(A_{t-1})$ for all $i, x$, we now set

$$A_t(R, i, x) := M_{i,x}(A_{t-1}) \oplus \bigoplus_{R' \ne R} A_t(R', i, x),$$

where the sum mod 2 ranges over all $R' = (r'_1, \ldots, r'_\ell)$ other than $R$ itself. Then when we are done, by assumption $A$ will satisfy

$$M_{i,x}(A) = \bigoplus_{R = (r_1, \ldots, r_\ell)} A(R, i, x)$$

for all $n \le 2^\ell$, $i \in \{1, \ldots, n\}$, and $x \in \{0,1\}^n$. So to simulate a PE machine $M_i$ on input $x$, a $\oplus\mathsf{DTIME}(n)$ machine just needs to return the above sum. Hence $\oplus\mathsf{DTIME}^A(n) = \mathsf{PE}^A$, and $\oplus\mathsf{P}^A = \mathsf{PEXP}^A$ by padding.
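
The parity trick in part (ii) is easy to check on a toy column of the oracle. The sketch below is only an illustration: a Python dictionary stands in for the column of $A$ indexed by a fixed $(i, x)$, `target_bit` plays the role of $M_{i,x}(A_{t-1})$, and the names `parity` and `encode_row` are hypothetical.

```python
from functools import reduce
from operator import xor

def parity(column):
    """XOR of the entries of one oracle column over all rows."""
    return reduce(xor, column.values(), 0)

def encode_row(column, row, target_bit):
    """Set column[row] so that the XOR over all rows equals target_bit."""
    others = reduce(xor, (bit for r, bit in column.items() if r != row), 0)
    column[row] = target_bit ^ others

# A toy column with four rows, holding arbitrary earlier contents.
column = {"00": 1, "01": 0, "10": 1, "11": 1}
encode_row(column, "10", 0)      # afterwards the column's parity is 0
first_parity = parity(column)
encode_row(column, "01", 1)      # re-encoding another row flips it to 1
second_parity = parity(column)
```

Whichever row was encoded last, the parity of the whole column equals the most recent target bit, which is why a parity machine can read off the answer without knowing which row that was.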